Reverse Product-Scanning Multiplication and Squaring on 8-Bit AVR Processors
نویسندگان
چکیده
High performance, small code size, and good scalability are important requirements for software implementations of multi-precision arithmetic algorithms to fit resource-limited embedded systems. In this paper, we describe optimization techniques to speed up multi-precision multiplication and squaring on the AVR ATmega series of 8-bit microcontrollers. First, we present a new approach to perform multi-precision multiplication, called Reverse Product Scanning (RPS), that resembles the hybrid technique of Gura et al., but calculates the byte-products in the inner loop in reverse order. The RPS method processes four bytes of the two operands in each iteration of the inner loop and employs two carry-catcher registers to minimize the number of add instructions. We also describe an optimized algorithm for multi-precision squaring based on the RPS technique that is, depending on the operand length, up to 44.3% faster than multiplication. Our AVR Assembly implementations of RPS multiplication and RPS squaring occupy less than 1 kB of code space each and are written in a parameterized fashion so that they can support operands of varying length without recompilation. Despite this high level of flexibility, our RPS multiplication outperforms the looped variant of Hutter et al.’s operand-caching technique and saves between 40 and 51% of code size. We also combine our RPS multiplication and squaring routines with Karatsuba’s method to further reduce execution time. When executed on an ATmega128 processor, the “karatsubarized RPS method” needs only 85 k clock cycles for a 1024-bit multiplication (or 48 k cycles for a squaring). These results show that it is possible to achieve high performance without sacrificing code size or scalability.
منابع مشابه
Optimized Karatsuba squaring on 8-bit AVR processors
Abstract. Multi-precision squaring is a crucial operation for implementation of Elliptic Curve Cryptography. Particularly, when it comes to embedded processors, the operation should be designed carefully to execute expensive ECC operation on resource constrained devices. In order to bridge the gap between high overheads and limited computation capabilities, we present optimized Karatsuba squari...
متن کاملNew Speed Records for Montgomery Modular Multiplication on 8-Bit AVR Microcontrollers
Modular multiplication of large integers is a performancecritical arithmetic operation of many public-key cryptosystems such as RSA, DSA, Diffie-Hellman (DH) and their elliptic curve-based variants ECDSA and ECDH. The computational cost of modular multiplication and related operations (e.g. exponentiation) poses a practical challenge to the widespread deployment of public-key cryptography, espe...
متن کاملLow-Weight Primes for Lightweight Elliptic Curve Cryptography on 8-bit AVR Processors
Small 8-bit RISC processors and micro-controllers based on the AVR instruction set architecture are widely used in the embedded domain with applications ranging from smartcards over control systems to wireless sensor nodes. Many of these applications require asymmetric encryption or authentication, which has spurred a body of research into implementation aspects of Elliptic Curve Cryptography (...
متن کاملEfficient Implementation of NIST-Compliant Elliptic Curve Cryptography for Sensor Nodes
In this paper, we present a highly-optimized implementation of standards-compliant Elliptic Curve Cryptography (ECC) for wireless sensor nodes and similar devices featuring an 8-bit AVR processor. The field arithmetic is written in Assembly language and optimized for the 192-bit NIST-specified prime p = 2 − 2 − 1, while the group arithmetic (i.e. point addition and doubling) is programmed in AN...
متن کاملEfficient Ring-LWE Encryption on 8-Bit AVR Processors
Public-key cryptography based on the “ring-variant” of the Learning with Errors (ring-LWE) problem is both efficient and believed to remain secure in a post-quantum world. In this paper, we introduce a carefully-optimized implementation of a ring-LWE encryption scheme for 8-bit AVR processors like the ATxmega128. Our research contributions include several optimizations for the Number Theoretic ...
متن کامل